Abstract: This is an expository paper, focussing on the following scenario.
We have two Markov chains, M and M'. By some means, we have obtained
a bound on the mixing time of M'. We wish to compare M with M'
in order to derive a corresponding bound on the mixing time of M. We
investigate the application of the comparison method of Diaconis and
Saloff-Coste to this scenario, giving a number of theorems which characterize
the applicability of the method. We focus particularly on the case in which the
chains are not reversible. The purpose of the paper is to provide a catalogue
of theorems which can be easily applied to bound mixing times.
1. Introduction

This expository paper focusses on Markov chain comparison, which is an
important tool for determining the mixing time of a Markov chain.

* Partially supported by the EPSRC grant Discontinuous Behaviour in the Complexity of 
Randomized Algorithms 
We are interested in the following scenario. We have two Markov chains, M
and M'. By some means, we have obtained a bound on the mixing time of M'.
We wish to compare M with M' in order to derive a corresponding bound on
the mixing time of M.

The foundation for the method lies in the comparison inequalities of Diaconis
and Saloff-Coste. These inequalities give bounds on the eigenvalues of a
reversible Markov chain in terms of the eigenvalues of a second chain. Similar
inequalities were used by Quastel in his study of the simple exclusion process
on coloured particles.

The inequalities of Diaconis and Saloff-Coste provide the foundation for
obtaining mixing-time bounds via comparison because there is a known close
relationship between the mixing time of an ergodic reversible Markov chain and
the eigenvalue which is second largest (in absolute value). This relationship was
made explicit by Diaconis and Stroock and by Sinclair (Proposition 1). The
latter is a discrete-time version of a proposition of Aldous.

Randall and Tetali [18] were the first to combine Diaconis and Saloff-Coste's
inequalities with the inequalities relating eigenvalues to mixing times to
derive a relationship between the mixing times of two chains. Their result
[18, Proposition 4] applies to two ergodic reversible chains M and M' provided
the eigenvalues satisfy certain restrictions (see the remarks following
Theorem 10 below).

While the inequalities of Diaconis and Saloff-Coste are stated for reversible
Markov chains, their proof does not use reversibility (see footnote 1). The
Dirichlet forms correspond more closely to mixing times in the time-reversible
case, but there is still some correspondence even without reversibility, as has
been observed by Mihail and Fill.

The primary purpose of our article is to pin down the applicability of the
comparison method for non-reversible chains. This is done in Section 4. The
main result (Theorem 24) is rather weaker than the corresponding theorem for
reversible chains (Theorem 8) but we give examples (Observation 21 and the
remark following Theorem 22) pointing out that the additional constraints are
necessary.

Section 3 describes the comparison theorem for reversible chains. The main
result (Theorem 8) is proved using exactly the method outlined by Randall and
Tetali [18]. We feel that it is useful to provide a general theorem (Theorem 8)
which applies to all reversible chains, including those that do not satisfy
constraints on the eigenvalues. Diaconis and Saloff-Coste's method is sufficient
for this task, provided the construction of the comparison is based on "odd
flows" rather than just on flows. Observation 12 shows that the restriction that
the flows be odd is necessary. The statement of Theorem 8 is deliberately general
in terms of the parameters ε and δ, which are deliberately different from each
other (unlike the corresponding theorem in [18]). The reason for the generality
is that the freedom to choose δ can lead to stronger results, as illustrated by

1 For non-reversible Markov chains, the eigenvalues of the transition matrix are not neces- 
sarily real, but it is still possible to make sense of "spectral gap" as we shall see in Section 1^1 
Example 9. We have included a proof of Theorem 5, which is essentially
Proposition 1(ii) of Sinclair, because, as far as we know, no proof appears in
the literature. The theorem gives a lower bound on the mixing time of an ergodic
reversible Markov chain in terms of its eigenvalues. A continuous-time version
has been proved by Aldous.

Lemma l^TI m Section|3formalizes a technique that we have found useful in the 
past. In order to use the comparison inequalities of Diaconis and Saloff-Coste, 
one must construct a flow in which the congestion on an edge of the transition 
graph of the Markov chain is small. Lemma [23 shows that it is sometimes 
sufficient to construct a flow in which the congestion on a state is small. 

Finally, we note that Section 5 of Randall and Tetali's paper [18] surveys
other comparison methods which are not based on Diaconis and Saloff-Coste's
inequalities. We will not repeat this survey, but refer the reader to [18].

2. Comparing Dirichlet forms 

The following variation of Diaconis and Saloff-Coste's comparison method comes
from [5, Section C]. It adapts an idea of Sinclair.

2.1. Definitions 

Let M be an ergodic (connected and aperiodic) Markov chain with transition
matrix P, stationary distribution π, and state space Ω. In this paper, the state
space Ω will always be discrete and finite. We will assume that all Markov chains
are discrete-time chains except where we indicate otherwise. Let E(M) be the
set of pairs of distinct states (x,y) with P(x,y) > 0. Let E*(M) be the set of
all pairs (x,y) (distinct or not) with P(x,y) > 0. We will sometimes refer to
the members of E*(M) as "edges" because they are the edges of the transition
graph of M. Define the optimal Poincaré constant of M by

\[ \lambda_1(M) = \inf_{\varphi:\Omega\to\mathbb{R}} \frac{\mathcal{E}_M(\varphi,\varphi)}{\operatorname{var}_\pi \varphi}, \]

where the infimum is over all non-constant functions φ from Ω to R and the
Dirichlet form is given by

\[ \mathcal{E}_M(\varphi,\varphi) = \frac12 \sum_{x,y\in\Omega} \pi(x)P(x,y)\,(\varphi(x)-\varphi(y))^2 \]

and

\[ \operatorname{var}_\pi \varphi = \sum_{x\in\Omega} \pi(x)\,(\varphi(x)-\mathrm{E}_\pi\varphi)^2 = \frac12 \sum_{x,y\in\Omega} \pi(x)\pi(y)\,(\varphi(x)-\varphi(y))^2. \]

Let

\[ \mathcal{F}_M(\varphi,\varphi) = \frac12 \sum_{x,y\in\Omega} \pi(x)P(x,y)\,(\varphi(x)+\varphi(y))^2. \]

If N is the size of Ω then let

\[ \lambda_{N-1}(M) = \inf_{\varphi:\Omega\to\mathbb{R}} \frac{\mathcal{F}_M(\varphi,\varphi)}{\operatorname{var}_\pi \varphi}, \]

where again the infimum is over all non-constant functions. When M is time
reversible, the eigenvalues 1 = β_0 > β_1 ≥ ··· ≥ β_{N−1} of the transition matrix P
are real. Then (see Facts 3 and 4 below) λ_1(M) may be interpreted as the gap
between β_1 and 1, while λ_{N−1}(M) is the gap between β_{N−1} and −1. Although
this explains the notation, the definitions of λ_1 and λ_{N−1} make sense even for
non-reversible Markov chains.
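As a small illustration of these definitions, the following Python sketch evaluates the two quadratic forms and the variance for a function on a finite chain. It assumes the forms are (1/2) Σ π(x)P(x,y)(φ(x) ∓ φ(y))²; the helper names and the two-state chain are our own choices for illustration, not part of the cited literature.

```python
from fractions import Fraction

def quadratic_form(P, pi, phi, sign):
    """Return (1/2) * sum_{x,y} pi[x] * P[x][y] * (phi[x] + sign * phi[y])**2.

    sign = -1 gives the Dirichlet form E_M(phi, phi);
    sign = +1 gives the companion form F_M(phi, phi).
    """
    n = len(pi)
    total = sum(pi[x] * P[x][y] * (phi[x] + sign * phi[y]) ** 2
                for x in range(n) for y in range(n))
    return total / 2

def variance(pi, phi):
    """Variance of phi under the distribution pi."""
    mean = sum(p * v for p, v in zip(pi, phi))
    return sum(p * (v - mean) ** 2 for p, v in zip(pi, phi))

# Hypothetical two-state chain: stay with probability 1/4, switch with 3/4.
# Its eigenvalues are 1 and -1/2.
stay = Fraction(1, 4)
P = [[stay, 1 - stay], [1 - stay, stay]]
pi = [Fraction(1, 2), Fraction(1, 2)]
phi = [1, -1]  # eigenfunction for the eigenvalue -1/2

ratio_E = quadratic_form(P, pi, phi, -1) / variance(pi, phi)
ratio_F = quadratic_form(P, pi, phi, +1) / variance(pi, phi)
```

On a two-state space every non-constant function is a shift and rescaling of this φ, so here the first ratio is the gap between β_1 and 1 and the second is the gap between β_{N−1} and −1.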

Suppose that M is an ergodic Markov chain on state space Ω with transition
matrix P and stationary distribution π, and that M' is another ergodic
Markov chain on the same state space with transition matrix P' and stationary
distribution π'.

For every edge (x, y) ∈ E*(M'), let 𝒫_{x,y} be the set of paths from x to y
using transitions of M. More formally, let 𝒫_{x,y} be the set of paths
γ = (x = x_0, x_1, ..., x_k = y) such that

1. each (x_i, x_{i+1}) is in E*(M), and

2. each (z,w) ∈ E*(M) appears at most twice on γ (see footnote 2).

We write |γ| to denote the length of path γ. So, for example, if γ = (x_0, ..., x_k)
we have |γ| = k. Let 𝒫 = ∪_{(x,y)∈E*(M')} 𝒫_{x,y}.

An (M, M')-flow is a function f from 𝒫 to the interval [0, 1] such that for
every (x,y) ∈ E*(M'),

\[ \sum_{\gamma\in\mathcal{P}_{x,y}} f(\gamma) = \pi'(x)P'(x,y). \tag{1} \]

The flow is said to be an odd (M, M')-flow if it is supported by odd-length
paths. That is, for every γ ∈ 𝒫, either f(γ) = 0 or |γ| is odd.

Let r((z,w), γ) be the number of times that the edge (z,w) appears on path γ.
For every (z,w) ∈ E*(M), the congestion of edge (z,w) in the flow f is the
quantity

\[ A_{z,w}(f) = \frac{1}{\pi(z)P(z,w)} \sum_{\gamma\in\mathcal{P}:\,(z,w)\in\gamma} r((z,w),\gamma)\,|\gamma|\,f(\gamma). \]

The congestion of the flow is the quantity

\[ A(f) = \max_{(z,w)\in E^*(M)} A_{z,w}(f). \]
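To make the definition of congestion concrete, here is a minimal Python sketch. It assumes the congestion of an edge (z,w) is the sum of r((z,w),γ)·|γ|·f(γ) over paths γ, divided by π(z)P(z,w). The chains, the flow and the function name `congestion` are hypothetical choices made for this illustration.

```python
from collections import defaultdict
from fractions import Fraction

def congestion(P, pi, flow):
    """Congestion A(f) of a flow given as {path (tuple of states): weight}."""
    load = defaultdict(Fraction)
    for path, weight in flow.items():
        length = len(path) - 1
        # Each traversal of an edge adds |path| * f(path); repeated
        # traversals therefore add up to r((z, w), path) * |path| * f(path).
        for z, w in zip(path, path[1:]):
            load[(z, w)] += length * weight
    return max(load[(z, w)] / (pi[z] * P[z][w]) for (z, w) in load)

# Hypothetical example: M is the simple random walk on a 3-cycle and
# M' jumps to a uniformly random state (possibly the current one).
half, third = Fraction(1, 2), Fraction(1, 3)
P = [[0, half, half], [half, 0, half], [half, half, 0]]
P_prime = [[third] * 3 for _ in range(3)]
pi = [third] * 3  # both chains have the uniform stationary distribution

flow = {}
for x in range(3):
    for y in range(3):
        if x != y:
            path = (x, y)                              # route the edge directly
        else:
            path = (x, (x + 1) % 3, (x + 2) % 3, x)    # loop of odd length 3
        flow[path] = pi[x] * P_prime[x][y]             # satisfies condition (1)

A = congestion(P, pi, flow)
is_odd_flow = all((len(path) - 1) % 2 == 1 for path in flow)
```

In this sketch every path has odd length, so the flow is an odd flow. The clockwise edges carry the three loops as well as one direct path, which is why they determine the maximum.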

2.2. Theorems 

The following theorems are due to Diaconis and Saloff-Coste. Theorem 1 is
Theorem 2.3 of [5].

2 This requirement is there for technical reasons: we want to ensure that 𝒫_{x,y} is finite.
Theorem 1. Suppose that M is an ergodic Markov chain on state space Ω and
that M' is another ergodic Markov chain on the same state space. If f is an
(M, M')-flow then for every φ : Ω → R, ℰ_{M'}(φ, φ) ≤ A(f) ℰ_M(φ, φ).

Remark. The statement of Theorem 2.3 in [5] requires M and M' to be
reversible, but the proof does not use this fact. In [5], 𝒫_{x,y} is defined to be
the set of simple paths from x to y. This is not an important restriction, because
a flow f can always be transformed into a flow f' which is supported by simple
paths and satisfies A(f') ≤ A(f). We prefer to use our definition so that an odd
flow is a special case of a flow.

Theorem 2. Suppose that M is an ergodic Markov chain on state space Ω and
that M' is another ergodic Markov chain on the same state space. If f is an
odd (M, M')-flow then for every φ : Ω → R, ℱ_{M'}(φ, φ) ≤ A(f) ℱ_M(φ, φ).

Remark. Theorem 2.2 of Diaconis and Saloff-Coste corresponds to the special
case in which 𝒫_{x,y} contains a particular path γ with f(γ) = π'(x)P'(x,y). Again,
the statement of the theorem requires the chains to be reversible, but the proof
does not use this. The authors point out that their theorem can be generalized to
the flow setting as we have done in Theorem 2. The same proof works.

Remark. Note that the definition of an (M, M')-flow requires us to route
π'(x)P'(x, y) units of flow between x and y for every pair (x,y) ∈ E*(M').
If we do not care whether the resulting flow is an odd flow we can dispense with
the case x = y. For this case, we can consider the length-0 path γ from x to
itself, and we can assign f(γ) = π'(x)P'(x, x). The quantity f(γ) contributes
nothing towards the congestion of the flow (since it does not use any edges of
E*(M)).

3. Comparing reversible Markov chains 
3.1. Definitions 

The variation distance between distributions θ_1 and θ_2 on Ω is

\[ \|\theta_1-\theta_2\| = \frac12 \sum_{i\in\Omega} |\theta_1(i)-\theta_2(i)| = \max_{A\subseteq\Omega} |\theta_1(A)-\theta_2(A)|. \]

For an ergodic Markov chain M with transition matrix P and stationary
distribution π, and a state x, the mixing time from x is

\[ \tau_x(M,\varepsilon) = \min\{t > 0 : \|P^{t'}(x,\cdot)-\pi(\cdot)\| \le \varepsilon \text{ for all } t' \ge t\}. \]

In fact, ||P^t(x,·) − π(·)|| is non-increasing in t, so an equivalent definition is

\[ \tau_x(M,\varepsilon) = \min\{t > 0 : \|P^{t}(x,\cdot)-\pi(\cdot)\| \le \varepsilon\}. \]

Let

\[ \tau(M,\varepsilon) = \max_{x} \tau_x(M,\varepsilon). \]
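For small chains the mixing time can be computed directly from these definitions. The following Python sketch is one way to do so, assuming the second (equivalent) form of the definition with t ≥ 1. The function names, the cut-off `t_max` and the two-state chain are our own illustrative choices.

```python
def tv_distance(mu, nu):
    """Variation distance: half of the L1 distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

def step(dist, P):
    """One step of the chain: row vector times transition matrix."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def mixing_time_from(x, P, pi, eps, t_max=10_000):
    """Smallest t >= 1 with ||P^t(x, .) - pi|| <= eps, or None if t_max is hit."""
    dist = [1.0 if i == x else 0.0 for i in range(len(P))]
    for t in range(1, t_max + 1):
        dist = step(dist, P)
        if tv_distance(dist, pi) <= eps:
            return t
    return None

def mixing_time(P, pi, eps):
    """Worst case of mixing_time_from over all starting states."""
    return max(mixing_time_from(x, P, pi, eps) for x in range(len(P)))

# Hypothetical two-state chain that stays put with probability 0.9.
P = [[0.9, 0.1], [0.1, 0.9]]
pi = [0.5, 0.5]
tau = mixing_time(P, pi, 0.25)

# The uniform random walk on two states mixes in a single step.
tau_uniform = mixing_time([[0.5, 0.5], [0.5, 0.5]], pi, 0.25)
```

Stopping at the first time the distance drops below ε is justified only because the distance is non-increasing in t.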
Let M be an ergodic Markov chain with transition matrix P, stationary
distribution π, and state space Ω. Let N = |Ω|. Suppose that P is reversible
with respect to π. That is, every x,y ∈ Ω satisfies π(x)P(x, y) = π(y)P(y,x).
Then the eigenvalues of P are real numbers and the maximum eigenvalue, 1,
has multiplicity one. The eigenvalues of P will be denoted as β_0 = 1 > β_1 ≥
··· ≥ β_{N−1} > −1. Let β_max(M) = max(β_1, |β_{N−1}|). The spectral representation
of the transition matrix P plays an important role in the results that follow.
Let D denote the diagonal matrix D = diag(π_0^{1/2}, ..., π_{N−1}^{1/2}). Then, using the
definition of reversibility, it is easy to see that A = DPD^{−1} is a symmetric
matrix. Therefore, standard results from linear algebra tell us there exists an
orthonormal basis {e^{(i)'} : 0 ≤ i ≤ N − 1} of left eigenvectors of A, where e^{(i)'}
is an eigenvector corresponding to the eigenvalue β_i, i.e., e^{(i)'}A = β_i e^{(i)'}.
We also have that e^{(0)}_j = π_j^{1/2} for j ∈ {0, ..., N − 1}. The important result we
require (see Section 3.2] or [20, Proof of Prop. 2.1] for a derivation) is that
for n ∈ N,

\[ P^n(j,k) = \pi(k) + \sqrt{\frac{\pi(k)}{\pi(j)}} \sum_{i=1}^{N-1} \beta_i^n\, e^{(i)}_j e^{(i)}_k, \tag{2} \]

where the P^n(j,k) are the n-step transition probabilities.

The following facts are well-known from linear algebra and follow from the
"minimax" (or "variational") characterization of the eigenvalues (see, in
particular, the Rayleigh-Ritz and Courant-Fischer theorems).

Fact 3. Let β_0 = 1 > β_1 ≥ ··· ≥ β_{N−1} > −1 be the eigenvalues of the transition
matrix of a reversible Markov chain M. Then 1 − β_1 = λ_1(M).

Fact 4. Let β_0 = 1 > β_1 ≥ ··· ≥ β_{N−1} > −1 be the eigenvalues of the transition
matrix of a reversible Markov chain M. Then 1 + β_{N−1} = λ_{N−1}(M).



3.2. Lower bounds on mixing time 

The following theorem is the same as Proposition 1(ii) of Sinclair (apart
from a factor of 2). Sinclair's proposition is stated without proof. Aldous
proves a continuous-time version of Theorem 5. As far as we are aware, there is
no published proof of the lower bound in discrete time so, for completeness, we
provide one here based on Aldous's idea.

Theorem 5. Suppose M is an ergodic reversible Markov chain. Then, for ε > 0,

\[ \tau(M,\varepsilon) \ge \frac{\beta_{\max}(M)}{1-\beta_{\max}(M)} \ln\frac{1}{2\varepsilon}. \]

Proof of Theorem 5. Let P be the transition matrix of M and write the
eigenvalues of P as β_0 = 1 > β_1 ≥ ··· ≥ β_{N−1} > −1. Let π be the stationary
distribution of M. Let A be the matrix defined in the spectral representation
of P in Section 3.1. Let d(n) = max_{j∈Ω} ||P^n(j, ·) − π(·)||. We first give a lower
bound on d(2n).

Let e^{(max)} denote an eigenvector (of A) corresponding to β_max. Since e^{(max)}
is an eigenvector (and hence not identically zero), there exists some coordinate
j_0 with e^{(max)}_{j_0} ≠ 0. Then, using (2), and keeping only the starting state
j = j_0 and the term k = j_0, we find

\[
\begin{aligned}
d(2n) &= \max_{j\in\Omega} \|P^{2n}(j,\cdot)-\pi(\cdot)\| \\
&= \max_{j\in\Omega} \frac12 \sum_{k\in\Omega} \left| \sqrt{\frac{\pi(k)}{\pi(j)}} \sum_{i=1}^{N-1} \beta_i^{2n}\, e^{(i)}_j e^{(i)}_k \right| \\
&\ge \frac12 \sum_{i=1}^{N-1} \beta_i^{2n} \bigl(e^{(i)}_{j_0}\bigr)^2 \\
&\ge \frac12\, \beta_{\max}^{2n} \bigl(e^{(\max)}_{j_0}\bigr)^2 .
\end{aligned}
\]

Using this lower bound, we have

\[ \liminf_{n\to\infty} \frac{d(2n)}{\beta_{\max}^{2n}} \ge \liminf_{n\to\infty} \frac{\frac12 \beta_{\max}^{2n} \bigl(e^{(\max)}_{j_0}\bigr)^2}{\beta_{\max}^{2n}} = \frac12 \bigl(e^{(\max)}_{j_0}\bigr)^2 > 0. \tag{3} \]

Fix δ > 0, and let τ* = τ(M, δ/2). So, by definition, d(τ*) ≤ δ/2.

For s, t ∈ N it is known that d(s+t) ≤ 2d(s)d(t). (For a proof of this fact see
[2, Chapter 2].) Using this inequality we see that d(2kτ*) ≤ 2^{2k}(d(τ*))^{2k} ≤ δ^{2k}.
Then

\[ \liminf_{k\to\infty} \frac{d(2k\tau^*)}{\beta_{\max}^{2k\tau^*}} \le \liminf_{k\to\infty} \frac{\delta^{2k}}{\beta_{\max}^{2k\tau^*}} = \liminf_{k\to\infty} \exp\bigl(2k(\ln\delta - \tau^*\ln\beta_{\max})\bigr). \tag{4} \]

If ln δ − τ* ln β_max < 0, the lim inf in (4) is 0. This contradicts the lower
bound in (3) (this lower bound also applies to the subsequence
d(2kτ*)/β_max^{2kτ*} of d(2n)/β_max^{2n}). Thus we conclude that
ln δ − τ* ln β_max ≥ 0, or

\[ \tau^* \ge \frac{\ln(1/\delta)}{\ln(1/\beta_{\max})}. \]

Finally, assuming β_max > 0 (otherwise the theorem holds trivially),

\[ \ln\frac{1}{\beta_{\max}} = \int_{\beta_{\max}}^{1} \frac{dx}{x} \le \int_{\beta_{\max}}^{1} \frac{dx}{x^2} = \frac{1-\beta_{\max}}{\beta_{\max}}. \]

Combining this inequality with the previous one we obtain

\[ \tau(M, \delta/2) \ge \frac{\beta_{\max}}{1-\beta_{\max}} \ln\frac{1}{\delta}. \]

Taking δ = 2ε gives the theorem. □

Corollary 6. Suppose M is an ergodic reversible Markov chain. Then

\[ \tau\Bigl(M, \frac{1}{2e}\Bigr) \ge \frac{\beta_{\max}(M)}{1-\beta_{\max}(M)}. \]
3.3. Upper bounds on mixing time

The following theorem is due to Diaconis and Stroock (Proposition 3) and to
Sinclair (Proposition 1(i)).

Theorem 7. Suppose that M is an ergodic reversible Markov chain with
stationary distribution π. Then

\[ \tau_x(M,\varepsilon) \le \frac{1}{1-\beta_{\max}(M)} \ln\frac{1}{\varepsilon\pi(x)}. \]
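The two eigenvalue bounds are easy to evaluate numerically. The sketch below reads Theorem 5 as τ(M, ε) ≥ (β_max/(1 − β_max)) ln(1/(2ε)) and Theorem 7 as τ_x(M, ε) ≤ (1/(1 − β_max)) ln(1/(επ(x))). The function names and the two-state chain are hypothetical and serve only as a sanity check.

```python
import math

def lower_bound(beta_max, eps):
    """Lower bound on tau(M, eps): beta_max / (1 - beta_max) * ln(1 / (2 eps))."""
    return beta_max / (1.0 - beta_max) * math.log(1.0 / (2.0 * eps))

def upper_bound(beta_max, eps, pi_x):
    """Upper bound on tau_x(M, eps): ln(1 / (eps * pi(x))) / (1 - beta_max)."""
    return math.log(1.0 / (eps * pi_x)) / (1.0 - beta_max)

# Hypothetical two-state chain P = [[0.9, 0.1], [0.1, 0.9]]: its eigenvalues
# are 1 and 0.8, and its stationary distribution is uniform.
beta_max, eps, pi_x = 0.8, 0.25, 0.5
lo = lower_bound(beta_max, eps)
hi = upper_bound(beta_max, eps, pi_x)

# For this particular chain the variation distance after t steps is
# 0.5 * 0.8**t, so the mixing time can be read off directly.
tau = next(t for t in range(1, 1000) if 0.5 * beta_max ** t <= eps)
```

For this chain the true mixing time lies between the two bounds, as the theorems require.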

3.4. Combining lower and upper bounds

In the following theorem we combine Diaconis and Saloff-Coste's comparison
method (Theorems 1 and 2) with upper bounds on mixing time (Theorem 7) and
lower bounds on mixing time (Theorem 5) to obtain a comparison theorem for
mixing times (Theorem 8). This combination was first provided by Randall and
Tetali in Proposition 4 of [18]. We use the same reasoning as Randall and Tetali,
though we consider odd flows in order to avoid assuming that the eigenvalues
are non-negative.

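Theorem 8. Suppose that M is a reversible ergodic Markov chain with
stationary distribution π and that M' is another reversible ergodic Markov chain
with the same stationary distribution. Suppose that f is an odd (M, M')-flow.
Then, for any 0 < δ < 1/2,

\[ \tau_x(M,\varepsilon) \le A(f) \left[ \frac{\tau(M',\delta)}{\ln(1/2\delta)} + 1 \right] \ln\frac{1}{\varepsilon\pi(x)}. \]

In particular,

\[ \tau_x(M,\varepsilon) \le A(f) \left[ \tau\Bigl(M',\frac{1}{2e}\Bigr) + 1 \right] \ln\frac{1}{\varepsilon\pi(x)}. \tag{5} \]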



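Proof. Let N be the size of the state space.

\[
\begin{aligned}
\tau_x(M,\varepsilon) &\le \frac{1}{1-\beta_{\max}(M)} \ln\frac{1}{\varepsilon\pi(x)} && \text{(by Theorem 7)} \\
&= \max\left\{ \frac{1}{\lambda_1(M)}, \frac{1}{\lambda_{N-1}(M)} \right\} \ln\frac{1}{\varepsilon\pi(x)} && \text{(by Facts 3 and 4)} \\
&\le \max\left\{ \frac{A(f)}{\lambda_1(M')}, \frac{A(f)}{\lambda_{N-1}(M')} \right\} \ln\frac{1}{\varepsilon\pi(x)} && \text{(by Theorems 1 and 2)} \\
&= A(f)\, \frac{1}{1-\beta_{\max}(M')} \ln\frac{1}{\varepsilon\pi(x)} && \text{(by Facts 3 and 4)} \\
&\le A(f) \left[ \frac{\tau(M',\delta)}{\ln(1/2\delta)} + 1 \right] \ln\frac{1}{\varepsilon\pi(x)} && \text{(by Theorem 5, noting } \ln(1/2\delta) > 0\text{)}.
\end{aligned}
\]

□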
Remark. The proof of Theorem 8 goes through if M and M' have different
stationary distributions as long as they have the same state space. However,
an extra factor arises to account for the difference between var_π φ and var_{π'} φ
in the application of Theorems 1 and 2. Not much is known about comparison
of two chains with very different state spaces. However, there have been some 
successes. See J^. 

The freedom to choose δ in the statement of Theorem 8 is often useful, as we
see in the following example, based on the "hard core gas" model.

Example 9. Suppose 𝒢 is some class of graphs of maximum degree Δ. Let
G ∈ 𝒢 be an n-vertex graph, and let Ω_G denote the set of all independent sets
in G. Let M'_G be an ergodic Markov chain on Ω_G with uniform stationary
distribution. Of the transitions of M'_G we assume only that they are uniformly
distributed and local. Specifically, to make a transition, a vertex v of G is selected
uniformly at random; then the current independent set is randomly updated just
on vertices within distance r of v. We regard the radius r, as well as the degree
bound Δ, as a constant.

Now let M_G be another Markov chain that fits this description with r = 0.
That is, it makes single-site updates according, say, to the heat-bath rule.
Suppose we have proved that τ(M'_G, ε) = O(n log(n/ε)). (This is a typical form
of mixing time bound coming out of a coupling argument.) In this example, any
reasonable choice of flow f will have A(f) = O(1): the canonical paths are of
constant length, and a constant number of them flow through any transition
of M_G. (To know how the constant implicit in O(1) depends on Δ and r, we'd
need to have more details about the transitions of M'_G, and be precise about
the flow f.) Note that it is easy to arrange for f to be an odd flow. Applying
Theorem 8 with the default choice δ = 1/2e yields τ(M_G, ε) = O(n² log(n/ε)),
whereas setting δ optimally at δ = 1/n, we achieve τ(M_G, ε) = O(n² log(1/ε)),
gaining a factor log n.
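The effect of the choice of δ can be seen numerically. The sketch below evaluates the bound A(f)[τ(M', δ)/ln(1/2δ) + 1] ln(1/(επ(x))) for two values of δ. The specific form n·ln(n/δ) for the known mixing time (with constant 1), the value A(f) = 1 and the estimate ln(1/π(x)) ≤ n ln 2 are all assumptions made purely for illustration.

```python
import math

def comparison_bound(A, tau_prime_delta, delta, log_term):
    """A * [tau(M', delta) / ln(1/(2 delta)) + 1] * ln(1/(eps * pi(x))).

    log_term is passed in as ln(1/(eps * pi(x))) to avoid forming tiny numbers.
    """
    assert 0.0 < delta < 0.5
    return A * (tau_prime_delta / math.log(1.0 / (2.0 * delta)) + 1.0) * log_term

n, eps = 1000, 0.01

def tau_prime(delta):
    """Assumed form of the known bound on tau(M', delta)."""
    return n * math.log(n / delta)

A = 1.0  # placeholder for the O(1) congestion
# There are at most 2**n independent sets, so ln(1/pi(x)) <= n ln 2.
log_term = n * math.log(2.0) + math.log(1.0 / eps)

delta_default = 1.0 / (2.0 * math.e)
delta_tuned = 1.0 / n
bound_default = comparison_bound(A, tau_prime(delta_default), delta_default, log_term)
bound_tuned = comparison_bound(A, tau_prime(delta_tuned), delta_tuned, log_term)
```

With these assumed constants the tuned choice δ = 1/n gives a bound several times smaller than the default choice, in line with the logarithmic factor gained in the example.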

In the literature, the applications of Diaconis and Saloff-Coste's comparison
method to mixing times are typically presented for the special case in which
β_max(M) is the second-highest eigenvalue of the transition matrix of M. In
this case, it is not necessary for the flow f to be an odd flow, so the proof of
Theorem 8 gives the following.

Theorem 10. Recall that β_1 is the second-highest eigenvalue of the transition
matrix of M. Theorem 8 holds with "odd (M, M')-flow" replaced by
"(M, M')-flow", provided β_max(M) = β_1.

Remark. Theorem 10 is similar to Randall and Tetali's inequality [18,
Proposition 4], which assumes that β_max(M) and β_max(M') correspond to the
second-highest eigenvalues of the relevant transition matrices, and that the
latter is at least 1/2.

Since the restriction that f be an odd flow is usually omitted from
applications of comparison to mixing in the literature, it is worth considering
the following example, which shows that the restriction is crucial for
Theorem 8. The general idea underlying the example is simple. Let
β_0 = 1 > β_1 ≥ ··· ≥ β_{N−1} be the eigenvalues of the transition matrix of M.
The eigenvalue β_{N−1} is equal to −1 if M is periodic and is greater than −1
otherwise. If this eigenvalue is close to −1 then M is nearly periodic, and this
slows down the mixing of M. Let M' be the uniform random walk on the state
space of M. Clearly M' mixes in a single step, but we can construct an
(M, M')-flow with low congestion as long as we take care to send flow along
paths whose lengths are consistent with the (near) periodicity of M.

Example 11. Let Ω = {a, b}. For a parameter δ ∈ (0, 1), let P(a, b) = P(b, a) =
1 − δ and P(a, a) = P(b, b) = δ. The stationary distribution π of M is uniform.
Let t be even. Then

\[ \|P^t(a,\cdot)-\pi(\cdot)\| \ge \Pr(X_t = a \mid X_0 = a) - \tfrac12 \ge (1-\delta)^t - \tfrac12 \ge \tfrac12 - \delta t, \]

so τ_a(M, 1/4) ≥ ⌊1/(2δ)⌋. Let M' be the uniform random walk on Ω. The
chain M' has stationary distribution π and mixes in a single step.

Let P be the transition matrix of M and let P' be the transition matrix of M'.
We will now construct an (M, M')-flow f.

For the edge (a,b) ∈ E*(M') let γ be the length-1 path (a,b) and assign
f(γ) = π(a)P'(a, b). Similarly, for the edge (b,a), let γ be the length-1 path
(b,a) and assign f(γ) = π(b)P'(b,a). For the edge (a, a) ∈ E*(M'), let γ be the
path a,b,a and assign f(γ) = π(a)P'(a, a). Finally, for the edge (b,b) let γ be
the path b, a, b and assign f(γ) = π(b)P'(b, b).

Note that A_{a,a}(f) = A_{b,b}(f) = 0. Also,

\[ A_{a,b}(f) = \frac{1}{\pi(a)P(a,b)} \bigl( \pi(a)P'(a,b) + 2\pi(a)P'(a,a) \bigr) = \frac{1}{1-\delta} \Bigl( \frac12 + 1 \Bigr) = \frac{3}{2(1-\delta)}. \]

Similarly, A_{b,a}(f) = 3/(2(1 − δ)). If we take δ ≤ 1/2, we have A(f) ≤ 3. Then

\[ A(f) \left[ \tau\Bigl(M',\frac{1}{2e}\Bigr) + 1 \right] \ln\frac{1}{\frac14 \pi(a)} \le 3 \cdot 2 \cdot \ln 8, \]

and by making δ small, we can get as far away as we want from the inequality
in Theorem 8.

Example 11 prompts the following observation.

Observation 12. In general, for reversible ergodic Markov chains M and M'
with the same state space and stationary distribution, the ratio between τ_x(M, ε)
and the quantity [τ(M', 1/(2e)) + 1] ln(1/(επ(x))) (from the right-hand side
of (5)) cannot be upper-bounded in terms of the congestion of an (M, M')-flow.
(We know from Theorem 8 that such a bound is possible if we restrict attention
to odd flows.)

It is well-known (see, for example, Sinclair) that the eigenvalues of the
transition matrix P of a Markov chain M are all non-negative if every state has
a self-loop probability which is at least 1/2. That is, the eigenvalues are
non-negative if every state x satisfies P(x, x) ≥ 1/2. Thus, Theorem 10 applies
to any such Markov chain M. Observation 13 below shows that even weaker lower
bounds on self-loop probabilities can routinely be translated into mixing-time
inequalities without consideration of odd flows.

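Observation 13. Suppose that M is a reversible ergodic Markov chain with
transition matrix P and stationary distribution π and that M' is another
reversible ergodic Markov chain with the same stationary distribution. Suppose
that f is an (M, M')-flow. Let c = min_x P(x, x), and assume c > 0. Then, for
any 0 < δ < 1/2,

\[ \tau_x(M,\varepsilon) \le \max\left\{ A(f) \left[ \frac{\tau(M',\delta)}{\ln(1/2\delta)} + 1 \right],\ \frac{1}{2c} \right\} \ln\frac{1}{\varepsilon\pi(x)}. \]

Proof. Write P = cI + (1 − c)P̂. Since the matrix P̂ is stochastic, its eigenvalues
β̂_i all satisfy |β̂_i| ≤ 1. The relationship between the eigenvalues of P and those
of P̂ is simply β_i = c + (1 − c)β̂_i. In particular β_{N−1} = c + (1 − c)β̂_{N−1} ≥ −1 + 2c.
By Theorem 5,

\[ \tau(M',\delta) \ge \frac{\beta_{\max}(M')}{1-\beta_{\max}(M')} \ln\frac{1}{2\delta}, \]

or, equivalently,

\[ \beta_{\max}(M') \le \frac{\tau(M',\delta)/\ln(1/2\delta)}{\tau(M',\delta)/\ln(1/2\delta) + 1}. \]

By Fact 3,

\[ \lambda_1(M') = 1 - \beta_1(M') \ge 1 - \beta_{\max}(M') \ge \left[ \frac{\tau(M',\delta)}{\ln(1/2\delta)} + 1 \right]^{-1}, \]

and hence, by Theorem 1,

\[ \lambda_1(M) \ge \frac{1}{A(f)} \left[ \frac{\tau(M',\delta)}{\ln(1/2\delta)} + 1 \right]^{-1}. \]

On the other hand, we know by Fact 4 and the lower bound on β_{N−1} calculated
above that λ_{N−1}(M) ≥ 2c. Thus

\[ 1 - \beta_{\max}(M) = \min\{\lambda_1(M), \lambda_{N-1}(M)\} \ge \min\left\{ \frac{1}{A(f)} \left[ \frac{\tau(M',\delta)}{\ln(1/2\delta)} + 1 \right]^{-1},\ 2c \right\}. \]

The result follows immediately from Theorem 7. □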



Suppose that M is a reversible Markov chain with transition matrix P. Let
M_zz be the Markov chain on the same state space with transition matrix
P_zz = ½(I + P). M_zz is often referred to as the "lazy" version of M. In the
literature, it is common to avoid considering negative eigenvalues by studying
the lazy chain M_zz rather than the chain M, using an inequality like the
following.
Observation 14. Suppose that M is a reversible ergodic Markov chain with
stationary distribution π and that M' is another reversible ergodic Markov chain
with the same stationary distribution. Suppose that f is an (M, M')-flow. Then

\[ \tau_x(M_{zz},\varepsilon) \le 2A(f) \left[ \tau\Bigl(M',\frac{1}{2e}\Bigr) + 1 \right] \ln\frac{1}{\varepsilon\pi(x)}. \]

Proof. Since the eigenvalues of M_zz are non-negative, 1 − β_max(M_zz) = λ_1(M_zz),
which is equal to ½ λ_1(M). The proof is now the same as that of Theorem 8. □

The approach of Observation 14 is somewhat unsatisfactory because it gives
no insight about the mixing time of M itself. For example, we can apply
Observation 14 to the chains M and M' from Example 11. This gives

\[ \tau(M_{zz}, 1/4) \le 2 \cdot \frac{3}{2(1-\delta)} \cdot [1 + 1] \ln 8 = \frac{6}{1-\delta} \ln 8, \]

whereas we know from Example 11 that τ_a(M, 1/4) ≥ ⌊1/(2δ)⌋. Making δ ≤ 1/4
small, we can make M mix arbitrarily slowly, while M_zz mixes in at most 17
steps.
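The gap between a chain and its lazy version can be observed directly on the two-state chain above. The following sketch builds ½(I + P) and computes both mixing times by brute force. The value δ = 0.01 and the helper names are our own choices for illustration.

```python
def lazy(P):
    """Lazy version of a chain: P_zz = (I + P) / 2."""
    n = len(P)
    return [[0.5 * ((1.0 if i == j else 0.0) + P[i][j]) for j in range(n)]
            for i in range(n)]

def mixing_time_from(x, P, pi, eps, t_max=100_000):
    """Smallest t >= 1 with variation distance from pi at most eps."""
    n = len(P)
    dist = [1.0 if i == x else 0.0 for i in range(n)]
    for t in range(1, t_max + 1):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        if 0.5 * sum(abs(a - b) for a, b in zip(dist, pi)) <= eps:
            return t
    return None

# Nearly periodic two-state chain: switch with probability 1 - delta.
delta = 0.01
P = [[delta, 1 - delta], [1 - delta, delta]]
pi = [0.5, 0.5]

tau_M = mixing_time_from(0, P, pi, 0.25)
tau_lazy = mixing_time_from(0, lazy(P), pi, 0.25)
```

The lazy chain is close to the uniform random walk and mixes almost immediately, while the original chain needs many more steps, and more still as δ decreases.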



4. Comparison without reversibility
4.1. Definitions

Given a discrete-time ergodic Markov chain M on state space Ω with transition
matrix P and stationary distribution π, the continuization M̃ is defined as
follows (see [2, Chapter 2]). Let Q be the transition rate matrix defined by
Q = P − I. Then the distribution at time t is v · exp(Qt) where v is the row vector
corresponding to the initial distribution. (For a concise treatment of matrix
exponentials, refer to Norris, Section 2.10.) The mixing time τ_x(M̃, ε) is
thus

\[ \tau_x(\tilde M,\varepsilon) = \inf\{t > 0 : \|v_x \cdot \exp(Qt') - \pi\| \le \varepsilon \text{ for all } t' \ge t\}, \]

where v_x is the unit vector with a 1 in state x and 0 elsewhere. Denote by
P̃^t = exp(Qt) the matrix of transition probabilities over a time interval of
length t. A standard fact is the following (see Norris, Thm 2.1.1).

Lemma 15. (d/dt) P̃^t = Q P̃^t = P̃^t Q.
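For a small chain, the matrix exp(Qt) can be approximated by truncating its power series. The sketch below does this in plain Python, assuming Q = P − I as above. The truncation length, the function names and the two-state chain are illustrative assumptions, and a series of this kind is only adequate for small matrices and moderate t.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(Q, t, terms=60):
    """Approximate exp(Q t) by the truncated series sum_k (Q t)^k / k!."""
    n = len(Q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current term (Q t)^k / k!
    Qt = [[q * t for q in row] for row in Q]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, Qt)]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result

# Hypothetical ergodic two-state chain and its transition rate matrix Q = P - I.
P = [[0.1, 0.9], [0.9, 0.1]]
Q = [[P[i][j] - (1.0 if i == j else 0.0) for j in range(2)] for i in range(2)]

Pt = expm(Q, 1.0)   # transition probabilities of the continuization at time 1
```

For this chain the non-zero eigenvalue of Q is −1.8, so the probability of being back at the start at time t is ½ + ½ exp(−1.8t), which the series reproduces.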

The conductance of a set S of states of M is given by

\[ \Phi_S(M) = \frac{\sum_{i\in S}\sum_{j\in\bar S} \pi(i)P(i,j) + \sum_{i\in\bar S}\sum_{j\in S} \pi(i)P(i,j)}{2\pi(S)\pi(\bar S)} \]

and the conductance of M is Φ(M) = min_S Φ_S(M), where the min is over all
S ⊂ Ω with 0 < π(S) < 1.

Suppose S is a subset of Ω with 0 < π(S) < 1. Let χ_S be the indicator
function for membership in S. That is, χ_S(x) = 1 if x ∈ S and χ_S(x) = 0
otherwise. Then since var_π χ_S = π(S)π(S̄), we have

\[ \Phi_S(M) = \frac{\mathcal{E}_M(\chi_S,\chi_S)}{\operatorname{var}_\pi \chi_S}. \]

Thus, Φ(M) is the same as λ_1(M) except that in Φ(M) we minimize over
non-constant functions φ : Ω → {0, 1} rather than over functions from Ω to R.
Thus we have

Observation 16. λ_1(M) ≤ Φ(M).
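For a chain with only a handful of states, the conductance can be found by enumerating all proper non-empty subsets. The sketch below assumes the symmetric definition, with the flow in both directions in the numerator and 2π(S)π(S̄) in the denominator. The function names and the 3-cycle example are hypothetical, and the enumeration is exponential in the number of states.

```python
from fractions import Fraction
from itertools import combinations

def conductance_of_set(P, pi, S):
    """Conductance of S: flow across the cut (both ways) over 2 pi(S) pi(S-bar)."""
    n = len(pi)
    S = set(S)
    comp = [j for j in range(n) if j not in S]
    out_flow = sum(pi[i] * P[i][j] for i in S for j in comp)
    in_flow = sum(pi[i] * P[i][j] for i in comp for j in S)
    pi_S = sum(pi[i] for i in S)
    return (out_flow + in_flow) / (2 * pi_S * (1 - pi_S))

def conductance(P, pi):
    """Minimum of conductance_of_set over all proper non-empty subsets."""
    n = len(pi)
    subsets = (S for k in range(1, n) for S in combinations(range(n), k))
    return min(conductance_of_set(P, pi, S) for S in subsets)

# Hypothetical example: simple random walk on a 3-cycle, uniform pi.
half, third = Fraction(1, 2), Fraction(1, 3)
P = [[0, half, half], [half, 0, half], [half, half, 0]]
pi = [third] * 3

phi = conductance(P, pi)
```

The eigenvalues of this walk are 1, −1/2 and −1/2, so λ_1 = 3/2, which is consistent with Observation 16 (here with equality).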



4.2. Lower bounds on mixing time

The analogue of Corollary 6 for the non-reversible case will be obtained by
combining Theorems 17, 18 and 19 below. Theorem 17 is from Dyer, Frieze and
Jerrum and Theorem 18 is a continuous-time version of the same result.

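Theorem 17. Suppose that M is an ergodic Markov chain. Then

\[ \Phi(M) \ge \frac{\frac12 - \frac{1}{2e}}{\tau\bigl(M, \frac{1}{2e}\bigr)}. \]

Proof. It is immediate from the symmetry in the definition of Φ_S(M) that
Φ_S(M) = Φ_{S̄}(M). Therefore, we can restrict the minimization in the definition
of Φ(M) to sets S with 0 < π(S) ≤ 1/2. That is,

\[ \Phi(M) = \min_{S\subset\Omega,\ 0<\pi(S)\le\frac12} \Phi_S(M). \]

Also, since

\[ \sum_{i\in S}\sum_{j\in\bar S} \pi(i)P(i,j) = \sum_{i\in S} \pi(i) - \sum_{i\in S}\sum_{j\in S} \pi(i)P(i,j) \]

and

\[ \sum_{i\in\bar S}\sum_{j\in S} \pi(i)P(i,j) = \sum_{i\in\Omega}\sum_{j\in S} \pi(i)P(i,j) - \sum_{i\in S}\sum_{j\in S} \pi(i)P(i,j) = \sum_{j\in S} \pi(j) - \sum_{i\in S}\sum_{j\in S} \pi(i)P(i,j), \]

the two terms in the numerator in the definition of Φ_S(M) are identical and
the definition of Φ_S(M) can be rewritten as follows.

\[ \Phi_S(M) = \frac{\sum_{i\in S}\sum_{j\in\bar S} \pi(i)P(i,j)}{\pi(S)\pi(\bar S)}. \]

Let Φ'(M) be the asymmetrical version of conductance. Namely,

\[ \Phi'(M) = \min_{S\subset\Omega,\ 0<\pi(S)\le\frac12} \Phi'_S(M), \]

where

\[ \Phi'_S(M) = \Phi_S(M)\,\pi(\bar S). \]

In the process of proving their Claim 2.3, Dyer, Frieze and Jerrum prove

\[ \Phi'(M) \ge \frac{\frac12 - \frac{1}{2e}}{\tau\bigl(M, \frac{1}{2e}\bigr)}. \]

The theorem follows since

\[ \Phi(M) = \min_{S\subset\Omega,\ 0<\pi(S)\le\frac12} \Phi_S(M) \ge \min_{S\subset\Omega,\ 0<\pi(S)\le\frac12} \Phi_S(M)\,\pi(\bar S) = \Phi'(M). \]

□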



A very similar bound holds in continuous time. There does not seem to be a 
published proof of this result, so we provide one here, modelled on the proof in 
discrete time [5]. 

Theorem 18. Suppose that M is an ergodic Markov chain. Then

\[ \Phi(M) \ge \frac{\frac12 - \frac{1}{2e}}{\tau\bigl(\tilde M, \frac{1}{2e}\bigr)}. \]

Proof. Suppose φ is an arbitrary function φ : Ω → R. The notation P̃^t φ
denotes the function Ω → R defined by [P̃^t φ](x) = Σ_{y∈Ω} P̃^t(x,y)φ(y), for all
x ∈ Ω. Define φ'_t : Ω → R by φ'_t(x) = (d/dt)[P̃^t φ](x) for all x. By Lemma 15,
φ'_t(x) = [QP̃^t φ](x) = [P̃^t Qφ](x), where [QP̃^t φ](x) is the function Ω → R defined
by [QP̃^t φ](x) = Σ_{y∈Ω} (QP̃^t)(x,y)φ(y) and [P̃^t Qφ](x) is defined similarly. Define
θ : R^+ → R^+ by θ(t) = ||φ'_t||_{π,1} = Σ_{x∈Ω} π(x)|φ'_t(x)|. Now observe that

\[ \bigl| [\tilde P^t\varphi](x) - \varphi(x) \bigr| = \left| \int_0^t \varphi'_s(x)\,ds \right| \le \int_0^t |\varphi'_s(x)|\,ds. \]

Multiplying by π(x) and summing over x we obtain

\[ \|\tilde P^t\varphi - \varphi\|_{\pi,1} \le \int_0^t \|\varphi'_s\|_{\pi,1}\,ds = \int_0^t \theta(s)\,ds. \tag{6} \]

As before, denote by χ_S the indicator function of the set S ⊂ Ω. We will show:
(a) θ(t) ≤ θ(0), for all t > 0, and (b) if φ = χ_S then θ(0) ≤ 2(var_π φ) Φ_S(M).
It follows from these facts and (6) that

\[ \|\tilde P^t\varphi - \varphi\|_{\pi,1} \le 2(\operatorname{var}_\pi\varphi)\,\Phi(M)\,t = 2\pi(S)\pi(\bar S)\,\Phi(M)\,t, \tag{7} \]

where φ = χ_S and S ⊂ Ω is a set that minimises Φ_S(M), i.e., one for which
Φ_S(M) = Φ(M). Now when t = τ(M̃, 1/2e), we know that |[P̃^t φ](x) − π(S)| ≤
1/2e for all x. (Otherwise there would be a state x ∈ Ω for which |[P̃^t φ](x) −
π(S)| > 1/2e. But then we would have |P̃^t(x,S) − π(S)| = |[P̃^t φ](x) − π(S)| >
1/2e, contrary to the choice of t.) Assume without loss of generality that π(S) ≤
1/2. Now

\[ \sum_{x\in\Omega} \pi(x)[\tilde P^t\varphi](x) = \sum_{x\in\Omega} \pi(x)\varphi(x) = \sum_{x\in S} \pi(x)\varphi(x), \]

so subtracting Σ_{x∈S} π(x)[P̃^t φ](x) from both sides (and using the fact that φ is
the indicator variable for S), we get

\[ \sum_{x\in\bar S} \pi(x)\bigl([\tilde P^t\varphi](x) - \varphi(x)\bigr) = \sum_{x\in S} \pi(x)\bigl(\varphi(x) - [\tilde P^t\varphi](x)\bigr). \]

Then

\[
\begin{aligned}
\|\tilde P^t\varphi - \varphi\|_{\pi,1} &\ge 2\sum_{x\in S} \pi(x)\bigl|[\tilde P^t\varphi](x) - \varphi(x)\bigr| \\
&\ge 2\sum_{x\in S} \pi(x)\bigl(1 - \pi(S) - 1/2e\bigr),
\end{aligned}
\]

which combines with (7) to yield the claimed result. To complete the proof, we
need to verify facts (a) and (b).
By Lemma 15,

\[
\begin{aligned}
\varphi'_t(x) = [Q\tilde P^t\varphi](x) &= \sum_{y\in\Omega} Q(x,y)[\tilde P^t\varphi](y) \\
&= \Bigl(\sum_{y\in\Omega} P(x,y)[\tilde P^t\varphi](y)\Bigr) - [\tilde P^t\varphi](x) \\
&= \sum_{y\in\Omega} P(x,y)\bigl([\tilde P^t\varphi](y) - [\tilde P^t\varphi](x)\bigr).
\end{aligned}
\]

In particular, if φ = χ_S and t = 0,

\[ \theta(0) = \|\varphi'_0\|_{\pi,1} \le \sum_{x,y\in\Omega} \pi(x)P(x,y)\,|\varphi(y)-\varphi(x)| = 2\mathcal{E}_M(\varphi,\varphi) = 2(\operatorname{var}_\pi\varphi)\,\Phi_S(M), \]

by definition of Φ_S, which is fact (b). Fact (a) follows from the following sequence
of (in)equalities:

\[
\begin{aligned}
\theta(t) = \|\varphi'_t\|_{\pi,1} &= \sum_x \pi(x)\,|\varphi'_t(x)| = \sum_x \pi(x)\,\bigl|[\tilde P^t Q\varphi](x)\bigr| \\
&= \sum_x \pi(x) \Bigl| \sum_y \tilde P^t(x,y)[Q\varphi](y) \Bigr| \\
&\le \sum_x \pi(x) \sum_y \tilde P^t(x,y)\,\bigl|[Q\varphi](y)\bigr| \\
&= \sum_y \bigl|[Q\varphi](y)\bigr| \sum_x \pi(x)\tilde P^t(x,y) \\
&= \sum_y \pi(y)\,\bigl|[Q\varphi](y)\bigr| \\
&= \theta(0).
\end{aligned}
\]

□

The following theorem is known as Cheeger's inequality.

Theorem 19. Suppose that M is an ergodic Markov chain. Then λ_1(M) ≥
Φ(M)²/8.

Proof. We will reduce to the reversible case, in which Cheeger's inequality is
well-known. Let P be the transition matrix of M and let π be its stationary
distribution. Let M̂ be the Markov chain with transition matrix P̂(x, y) =
½(P(x, y) + (π(y)/π(x))P(y, x)). Now for any φ : Ω → R

\[
\begin{aligned}
\mathcal{E}_{\hat M}(\varphi,\varphi) &= \frac12 \sum_{x,y} \pi(x)\hat P(x,y)\,(\varphi(x)-\varphi(y))^2 \\
&= \frac14 \sum_{x,y} \pi(x)P(x,y)\,(\varphi(x)-\varphi(y))^2 + \frac14 \sum_{x,y} \pi(y)P(y,x)\,(\varphi(x)-\varphi(y))^2 \\
&= \mathcal{E}_M(\varphi,\varphi).
\end{aligned}
\]

This implies both λ_1(M̂) = λ_1(M) and Φ(M̂) = Φ(M) since these are just
minimisations of ℰ_{M̂}(φ,φ) and ℰ_M(φ,φ) over φ (recall the remark just before
Observation 16).

Note that M̂ is time-reversible since

\[ \pi(x)\hat P(x,y) = \frac12 \pi(x)P(x,y) + \frac12 \pi(y)P(y,x) = \pi(y)\hat P(y,x). \]

Now let Φ'(M̂) be the asymmetrical conductance from the proof of
Theorem 17. Since M̂ is time-reversible, the eigenvalues of P̂ are real numbers
β_0 = 1 > β_1 ≥ ··· ≥ β_{N−1} > −1 and from Fact 3 we have 1 − β_1 = λ_1(M̂). Now
by Lemma 2.4 of Sinclair we have λ_1(M) = λ_1(M̂) ≥ Φ'(M̂)²/2. Also,

\[ \Phi(\hat M) = \min_{S\subset\Omega,\ 0<\pi(S)\le\frac12} \Phi_S(\hat M) \le \min_{S\subset\Omega,\ 0<\pi(S)\le\frac12} \Phi_S(\hat M)\,2\pi(\bar S) = 2\Phi'(\hat M), \]

so Φ'(M̂) ≥ Φ(M̂)/2 = Φ(M)/2. □
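The reduction to the reversible case can be checked on a small example. The sketch below forms the averaged chain with entries ½(P(x,y) + π(y)P(y,x)/π(x)) and compares Dirichlet forms exactly. The biased walk on a 3-cycle, the test function and the helper names are assumptions made for illustration.

```python
from fractions import Fraction as F

def reversibilize(P, pi):
    """Return P_hat with P_hat[x][y] = (P[x][y] + pi[y] * P[y][x] / pi[x]) / 2."""
    n = len(pi)
    return [[(P[x][y] + pi[y] * P[y][x] / pi[x]) / 2 for y in range(n)]
            for x in range(n)]

def dirichlet_form(P, pi, phi):
    """(1/2) * sum_{x,y} pi[x] * P[x][y] * (phi[x] - phi[y])**2."""
    n = len(pi)
    return sum(pi[x] * P[x][y] * (phi[x] - phi[y]) ** 2
               for x in range(n) for y in range(n)) / 2

def is_reversible(P, pi):
    """Check detailed balance pi[x] P[x][y] == pi[y] P[y][x] for all x, y."""
    n = len(pi)
    return all(pi[x] * P[x][y] == pi[y] * P[y][x]
               for x in range(n) for y in range(n))

# Hypothetical non-reversible chain: a biased lazy walk on a 3-cycle.
# It is doubly stochastic, so its stationary distribution is uniform.
P = [[F(1, 10), F(7, 10), F(2, 10)],
     [F(2, 10), F(1, 10), F(7, 10)],
     [F(7, 10), F(2, 10), F(1, 10)]]
pi = [F(1, 3)] * 3
P_hat = reversibilize(P, pi)
phi = [0, 1, 3]

same_form = dirichlet_form(P, pi, phi) == dirichlet_form(P_hat, pi, phi)
```

Exact rational arithmetic is used so that the equality of the two Dirichlet forms and the detailed balance condition can be tested without rounding error.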
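Combining Theorems 17, 18 and 19 we get the following analogue of the corresponding corollary for reversible chains.

Corollary 20. Suppose that $\mathcal{M}$ is an ergodic Markov chain. Then
\[
\lambda_1(\mathcal{M}) \ge \frac{\bigl(\frac12 - \frac{1}{2e}\bigr)^2}{8\,\tau(\mathcal{M},\frac{1}{2e})^2}
\]
and
\[
\lambda_1(\mathcal{M}) \ge \frac{\bigl(\frac12 - \frac{1}{2e}\bigr)^2}{8\,\tau(\widetilde{\mathcal{M}},\frac{1}{2e})^2}.
\]

Note that the first lower bound is a function of the discrete mixing time $\tau(\mathcal{M},\frac{1}{2e})$ and the second is a function of the continuous mixing time $\tau(\widetilde{\mathcal{M}},\frac{1}{2e})$. Otherwise, the bounds are the same.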

Corollary 20 seems quite weak compared to the corresponding corollary for reversible chains because of the squaring of the mixing time in the denominator. It turns out that our bound cannot be improved for the general (non-reversible) case.

Observation 21. There is an ergodic Markov chain $\mathcal{M}$ with $\lambda_1(\mathcal{M}) \in O\bigl(1/\tau(\mathcal{M},\frac{1}{2e})^2\bigr)$.

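Proof. Let $\mathcal{M}$ be the Markov chain described in Section 2 of 0j . This chain has state space
\[
\Omega = \{-(n-1), -(n-2), \ldots, -1, 0, 1, \ldots, n\}.
\]
The transition matrix $P$ is defined as follows. If $j = i+1 \pmod{2n}$ then $P(i,j) = 1 - 1/n$. Also, if $j = -i \pmod{2n}$ then $P(i,j) = 1/n$. The stationary distribution $\pi$ is uniform on the $2n$ states. Diaconis et al. [Theorem 1] show that $\tau(\mathcal{M},\frac{1}{2e}) \in O(n)$. We show that $\lambda_1(\mathcal{M}) \in O(1/n^2)$, implying $\lambda_1(\mathcal{M}) \in O\bigl(1/\tau(\mathcal{M},\frac{1}{2e})^2\bigr)$.

Let $\varphi$ be the function given by $\varphi(i) = |i|$. We show that $\mathcal{E}_{\mathcal{M}}(\varphi,\varphi) \in O(1)$ and $\operatorname{var}_\pi\varphi \in \Omega(n^2)$.

First, the transitions from $i$ to $-i$ preserve $\varphi$, so to calculate $\mathcal{E}_{\mathcal{M}}$, we need only consider the edges from $i$ to $i+1$ (over which $\varphi$ differs by 1):
\[
\mathcal{E}_{\mathcal{M}}(\varphi,\varphi) = \frac12\sum_{i\in\Omega}\frac{1}{2n}\Bigl(1-\frac1n\Bigr) = \frac12\Bigl(1-\frac1n\Bigr).
\]

To calculate $\operatorname{var}_\pi\varphi$, we observe that
\[
\mathrm{E}_\pi\varphi = \frac{1}{2n}\Bigl(\sum_{i=1}^{n} i + \sum_{i=1}^{n-1} i\Bigr) = \frac{n}{2},
\]
so $(\mathrm{E}_\pi\varphi)^2 = n^2/4$. Also
\[
\mathrm{E}_\pi(\varphi^2) = \frac{1}{2n}\Bigl(n^2 + 2\cdot\frac{(n-1)n(2(n-1)+1)}{6}\Bigr) = \frac{2n^2+1}{6}.
\]
So $\operatorname{var}_\pi\varphi = \mathrm{E}_\pi(\varphi^2) - (\mathrm{E}_\pi\varphi)^2 = \frac{n^2+2}{12}$. □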
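
The two quantities computed in this proof can be reproduced exactly for small $n$. The sketch below assumes the chain is exactly as stated in the proof (forward step with probability $1 - 1/n$, jump from $i$ to $-i$ with probability $1/n$, uniform stationary distribution). The encoding of states as residues modulo $2n$ and the function names are illustrative choices, not taken from the paper.

```python
from fractions import Fraction as F

def diaconis_chain(n):
    """Transition matrix on residues mod 2n.

    Residue r stands for the state r if r <= n, and for r - 2n otherwise,
    so the state space is {-(n-1), ..., n}.
    """
    N = 2 * n
    P = [[F(0)] * N for _ in range(N)]
    for i in range(N):
        P[i][(i + 1) % N] += 1 - F(1, n)  # step from i to i + 1 (mod 2n)
        P[i][(-i) % N] += F(1, n)         # jump from i to -i (mod 2n)
    return P

def phi(r, n):
    """phi(i) = |i| for the representative i of residue r."""
    return r if r <= n else 2 * n - r

def dirichlet_and_variance(n):
    """Return (E(phi, phi), var_pi(phi)) under the uniform distribution."""
    N = 2 * n
    P = diaconis_chain(n)
    pi = F(1, N)
    vals = [F(phi(r, n)) for r in range(N)]
    E = F(1, 2) * sum(pi * P[i][j] * (vals[i] - vals[j]) ** 2
                      for i in range(N) for j in range(N))
    mean = sum(pi * v for v in vals)
    var = sum(pi * v * v for v in vals) - mean ** 2
    return E, var

for n in (2, 5, 10):
    E, var = dirichlet_and_variance(n)
    # E / var is an upper bound on lambda_1 and shrinks like 1/n^2.
    print(n, E, var, E / var)
```

For each tested $n$ the Dirichlet form equals $(n-1)/(2n)$ and the variance equals $(n^2+2)/12$, so the ratio decays quadratically in $n$.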
4.3. Upper bounds on mixing time

We now give a continuous-time analogue of Theorem 7 for the general (non-reversible) case. Our proof follows the treatment in Jerrum on pages 63+ followed by 55-56, with a few details filled in.

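Theorem 22. Suppose that $\mathcal{M}$ is an ergodic Markov chain with stationary distribution $\pi$. Then
\[
\tau_x(\widetilde{\mathcal{M}},\varepsilon) \le \frac{1}{2\lambda_1(\mathcal{M})}\ln\frac{1}{\varepsilon^2\pi(x)}.
\]
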
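Proof. Let $\Omega$ be the state space of $\mathcal{M}$. Let $P$ be the transition matrix of $\mathcal{M}$, let $Q = P - I$ be the transition rate matrix of $\mathcal{M}$ and $P^t = \exp(Qt)$ as in Section ETT1. Let $\varphi$ be any function from $\Omega$ to $\mathbb{R}$ with $\mathrm{E}_\pi\varphi = 0$. If $x$ is chosen from a distribution then $[P^t\varphi](x)$ is a random variable. Note that $\mathrm{E}_\pi[P^t\varphi] = 0$. Using Lemma 15 we have
\[
\begin{aligned}
\frac{d}{dt}\operatorname{var}_\pi[P^t\varphi] &= \frac{d}{dt}\sum_{x\in\Omega}\pi(x)\bigl([P^t\varphi](x)\bigr)^2 \\
&= 2\sum_{x\in\Omega}\pi(x)\,[P^t\varphi](x)\,\frac{d}{dt}[P^t\varphi](x) \\
&= 2\sum_{x\in\Omega}\pi(x)\,[P^t\varphi](x)\,[QP^t\varphi](x) \\
&= 2\sum_{x,y\in\Omega}\pi(x)Q(x,y)\,[P^t\varphi](x)\,[P^t\varphi](y) \\
&= 2\sum_{x,y\in\Omega}\pi(x)P(x,y)\,[P^t\varphi](x)\,[P^t\varphi](y) - 2\sum_{x\in\Omega}\pi(x)\,[P^t\varphi](x)\,[P^t\varphi](x) \\
&= -2\,\mathcal{E}_{\mathcal{M}}(P^t\varphi,P^t\varphi) \\
&= -2\,\frac{\mathcal{E}_{\mathcal{M}}(P^t\varphi,P^t\varphi)}{\operatorname{var}_\pi[P^t\varphi]}\,\operatorname{var}_\pi[P^t\varphi] \\
&\le -2\lambda_1(\mathcal{M})\operatorname{var}_\pi[P^t\varphi].
\end{aligned}
\]

Now let $w$ denote $\operatorname{var}_\pi[P^t\varphi]$ and consider the differential inequality $\frac{dw}{dt} \le -2\lambda_1(\mathcal{M})w$. Solving this we get $\frac{dw}{w} \le -2\lambda_1(\mathcal{M})\,dt$ so $\ln w \le -2\lambda_1(\mathcal{M})t + c$ and $w \le \exp(c - 2\lambda_1(\mathcal{M})t)$. Plugging in $t = 0$ we get $w \le (\operatorname{var}_\pi\varphi)\exp(-2\lambda_1(\mathcal{M})t)$.

For a subset $A \subseteq \Omega$ define $\varphi : \Omega \to \mathbb{R}$ by
\[
\varphi(x) = \begin{cases} 1 - \pi(A) & \text{if } x \in A, \\ -\pi(A) & \text{otherwise.} \end{cases}
\]
Note that $\mathrm{E}_\pi\varphi = 0$ and $\operatorname{var}_\pi\varphi \le 1$ so
\[
\operatorname{var}_\pi[P^t\varphi] \le \exp(-2\lambda_1(\mathcal{M})t).
\]
Set
\[
t = \frac{1}{2\lambda_1(\mathcal{M})}\ln\frac{1}{\varepsilon^2\pi(x)}.
\]
Then
\[
\varepsilon \ge \sqrt{\frac{\operatorname{var}_\pi[P^t\varphi]}{\pi(x)}} \ge \bigl|[P^t\varphi](x)\bigr| = \Bigl|\sum_{y\in A}\exp(Qt)(x,y) - \pi(A)\Bigr| = \bigl|\Pr(X_t \in A) - \pi(A)\bigr|,
\]
for any $A$. So the continuous-time mixing time satisfies $\tau_x(\widetilde{\mathcal{M}},\varepsilon) \le t$.

□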



Remark. If $\mathcal{M}$ is not reversible, it may not always be possible to get a discrete-time version of Theorem 22. In fact, we cannot always upper-bound $\tau_x(\mathcal{M},\varepsilon)$ even as a function of both $1/\lambda_1(\mathcal{M})$ and $1/\lambda_{N-1}(\mathcal{M})$. Here is an example. Let $\mathcal{M}$ be the Markov chain which deterministically follows a directed cycle of length 3. Both $\lambda_1(\mathcal{M})$ and $\lambda_{N-1}(\mathcal{M})$ are bounded above 0. To see this, let $\varphi$ be a function from $\Omega$ to $\mathbb{R}$ with variance 1. Note that $\mathcal{E}_{\mathcal{M}}(\varphi,\varphi)$ and $\mathcal{F}_{\mathcal{M}}(\varphi,\varphi)$ are non-zero. But $\mathcal{M}$ is not ergodic (so certainly we cannot upper-bound its mixing time!). Let $\mathcal{M}'$ denote the uniform random walk on $K_3$. There is a low-congestion odd $(\mathcal{M},\mathcal{M}')$-flow $f$, so $1/\lambda_1(\mathcal{M}) \le A(f)/\lambda_1(\mathcal{M}')$ and $1/\lambda_{N-1}(\mathcal{M}) \le A(f)/\lambda_{N-1}(\mathcal{M}')$. These inequalities do not give upper bounds on $\tau_x(\mathcal{M},\varepsilon)$ because, while they rule out length-2 periodicities in $\mathcal{M}$, they do not rule out higher periodicities.
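
The directed 3-cycle in this remark is small enough to simulate directly. The following sketch, with illustrative helper names, shows that the distribution of the discrete-time chain started at a fixed state never approaches the uniform stationary distribution, while the Dirichlet form of a non-constant test function is strictly positive. The test function is an arbitrary choice made for the example.

```python
from fractions import Fraction as F

# Deterministic walk on a directed 3-cycle: 0 -> 1 -> 2 -> 0.
P = [[F(0), F(1), F(0)],
     [F(0), F(0), F(1)],
     [F(1), F(0), F(0)]]
pi = [F(1, 3)] * 3  # the uniform distribution is stationary

def step(dist, P):
    """Apply one step of the chain to a distribution (row vector)."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]

def tv_distance(p, q):
    """Total variation distance between two distributions."""
    return F(1, 2) * sum(abs(a - b) for a, b in zip(p, q))

def dirichlet(P, pi, phi):
    """E(phi, phi) = 1/2 * sum_{x,y} pi(x) P(x,y) (phi(x) - phi(y))^2."""
    n = len(P)
    return F(1, 2) * sum(pi[x] * P[x][y] * (phi[x] - phi[y]) ** 2
                         for x in range(n) for y in range(n))

dist = [F(1), F(0), F(0)]  # start in state 0
distances = []
for t in range(1, 10):
    dist = step(dist, P)
    distances.append(tv_distance(dist, pi))

phi = [F(1), F(0), F(-1)]  # a non-constant test function (arbitrary choice)
print(distances)           # the distance to stationarity never decreases
print(dirichlet(P, pi, phi))
```

The total variation distance stays at 2/3 for every $t$, which is the period-3 behaviour that bounds on $\lambda_1$ and $\lambda_{N-1}$ alone cannot exclude.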

Let $R(\mathcal{M})$ be the time-reversal of $\mathcal{M}$ with transition matrix $R(P)$ given by
\[
R(P)(x,y) = \frac{\pi(y)}{\pi(x)}P(y,x).
\]
Consider the chain $R(\mathcal{M})\mathcal{M}$ which does one step of $R(\mathcal{M})$ followed by one step of $\mathcal{M}$ during each transition. Here is a discrete-time companion to Theorem 22. This is based on Theorem 2.1 of Fill. This idea (bounding convergence in terms of the Dirichlet form of $R(\mathcal{M})\mathcal{M}$) is also in [14].

Theorem 23. Suppose that $\mathcal{M}$ is an ergodic Markov chain with stationary distribution $\pi$. Then
\[
\tau_x(\mathcal{M},\varepsilon) \le \frac{1}{\lambda_1(R(\mathcal{M})\mathcal{M})}\ln\frac{1}{\varepsilon^2\pi(x)}.
\]

Proof. Let $\varphi$ be a function from $\Omega \to \mathbb{R}$ with $\mathrm{E}_\pi\varphi = 0$. The following equality, due to Mihail, is Proposition 2.3 of [iHj .

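\[
\begin{aligned}
\mathcal{E}_{R(\mathcal{M})\mathcal{M}}(\varphi,\varphi) &= \frac12\sum_{x,y,z}\pi(x)R(P)(x,y)P(y,z)(\varphi(x)-\varphi(z))^2 \\
&= \operatorname{var}_\pi\varphi - \sum_{x,y,z}\pi(x)R(P)(x,y)P(y,z)\varphi(x)\varphi(z) \\
&= \operatorname{var}_\pi\varphi - \sum_{x,y,z}\pi(y)P(y,x)P(y,z)\varphi(x)\varphi(z) \\
&= \operatorname{var}_\pi\varphi - \sum_{y}\pi(y)\Bigl(\sum_{x}P(y,x)\varphi(x)\Bigr)\Bigl(\sum_{z}P(y,z)\varphi(z)\Bigr) \\
&= \operatorname{var}_\pi\varphi - \operatorname{var}_\pi(P\varphi).
\end{aligned}
\]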



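This gives
\[
\begin{aligned}
\operatorname{var}_\pi(P\varphi) &= \operatorname{var}_\pi\varphi - \mathcal{E}_{R(\mathcal{M})\mathcal{M}}(\varphi,\varphi) \\
&= \Bigl(1 - \frac{\mathcal{E}_{R(\mathcal{M})\mathcal{M}}(\varphi,\varphi)}{\operatorname{var}_\pi\varphi}\Bigr)\operatorname{var}_\pi\varphi \\
&\le \bigl(1 - \lambda_1(R(\mathcal{M})\mathcal{M})\bigr)\operatorname{var}_\pi\varphi
\end{aligned}
\]
so
\[
\operatorname{var}_\pi(P^t\varphi) \le \bigl(1 - \lambda_1(R(\mathcal{M})\mathcal{M})\bigr)^t\operatorname{var}_\pi\varphi \le \exp\bigl(-t\lambda_1(R(\mathcal{M})\mathcal{M})\bigr)\operatorname{var}_\pi\varphi.
\]
Then we can finish as in the proof of Theorem 22. □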
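
The identity attributed to Mihail in this proof can be verified exactly on a small example. The sketch below assumes a hypothetical three-state non-reversible chain with uniform stationary distribution (chosen for illustration, not taken from the paper) and a mean-zero test function. All function names are illustrative.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def time_reversal(P, pi):
    """R(P)(x,y) = pi(y) / pi(x) * P(y,x)."""
    n = len(P)
    return [[pi[y] / pi[x] * P[y][x] for y in range(n)] for x in range(n)]

def dirichlet(K, pi, phi):
    """E_K(phi, phi) = 1/2 * sum_{x,z} pi(x) K(x,z) (phi(x) - phi(z))^2."""
    n = len(K)
    return F(1, 2) * sum(pi[x] * K[x][z] * (phi[x] - phi[z]) ** 2
                         for x in range(n) for z in range(n))

def variance(pi, phi):
    """Variance of phi under pi."""
    mean = sum(p * v for p, v in zip(pi, phi))
    return sum(p * (v - mean) ** 2 for p, v in zip(pi, phi))

def apply_matrix(P, phi):
    """[P phi](x) = sum_y P(x,y) phi(y)."""
    n = len(P)
    return [sum(P[x][y] * phi[y] for y in range(n)) for x in range(n)]

# Hypothetical example chain: biased walk on a 3-cycle, pi uniform.
P = [[F(0), F(2, 3), F(1, 3)],
     [F(1, 3), F(0), F(2, 3)],
     [F(2, 3), F(1, 3), F(0)]]
pi = [F(1, 3)] * 3
phi = [F(1), F(0), F(-1)]  # mean zero under pi

K = matmul(time_reversal(P, pi), P)  # one step of R(M), then one step of M
lhs = dirichlet(K, pi, phi)
rhs = variance(pi, phi) - variance(pi, apply_matrix(P, phi))
print(lhs, rhs)
```

Both sides evaluate to 4/9 for this chain, which illustrates how one step of the chain reduces the variance by exactly the Dirichlet form of $R(\mathcal{M})\mathcal{M}$.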

4.4. Combining lower and upper bounds

The following theorem follows immediately from Theorem 22, Theorem ^ and Corollary 20.

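Theorem 24. Suppose that $\mathcal{M}$ is an ergodic Markov chain with stationary distribution $\pi$ and that $\mathcal{M}'$ is another ergodic Markov chain with the same stationary distribution. Suppose that $f$ is an $(\mathcal{M},\mathcal{M}')$-flow. Then
\[
\tau_x(\widetilde{\mathcal{M}},\varepsilon) \le 4A(f)\,\frac{\tau(\widetilde{\mathcal{M}'},\frac{1}{2e})^2}{\bigl(\frac12 - \frac{1}{2e}\bigr)^2}\,\ln\frac{1}{\varepsilon^2\pi(x)}
\]
and
\[
\tau_x(\widetilde{\mathcal{M}},\varepsilon) \le 4A(f)\,\frac{\tau(\mathcal{M}',\frac{1}{2e})^2}{\bigl(\frac12 - \frac{1}{2e}\bigr)^2}\,\ln\frac{1}{\varepsilon^2\pi(x)}.
\]

As in Corollary 20, the first inequality gives an upper bound in terms of the continuous mixing time $\tau(\widetilde{\mathcal{M}'},\frac{1}{2e})$. The second inequality is the same except that the upper bound is in terms of the discrete mixing time $\tau(\mathcal{M}',\frac{1}{2e})$.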

If we use Theorem 23 instead of Theorem 22 we get

Theorem 25. Suppose that $\mathcal{M}$ is an ergodic Markov chain with stationary distribution $\pi$ and that $\mathcal{M}'$ is another ergodic Markov chain with the same stationary distribution. Suppose that $f$ is an $(R(\mathcal{M})\mathcal{M},\mathcal{M}')$-flow. Then
\[
\tau_x(\mathcal{M},\varepsilon) \le A(f)\,\frac{8\,\tau(\mathcal{M}',\frac{1}{2e})^2}{\bigl(\frac12 - \frac{1}{2e}\bigr)^2}\,\ln\frac{1}{\varepsilon^2\pi(x)}.
\]



We can do better if $\mathcal{M}'$ is known to be reversible. Using Corollary lHI we get

Theorem 26. Suppose that $\mathcal{M}$ is an ergodic Markov chain with stationary distribution $\pi$ and that $\mathcal{M}'$ is another reversible ergodic Markov chain with the same stationary distribution. Suppose that $f$ is an $(\mathcal{M},\mathcal{M}')$-flow. Then



r x (M,e) < 



V 2e 



1 



In 



1 



£ 2 7r(x) 



5. Comparison and state congestion 

This Section generalizes an idea that we used in 0. In order to use the com- 
parison inequalities of Diaconis and Saloff-Coste, one must construct a flow in 
which the congestion on an edge of the transition graph of the Markov chain is 
small. The following lemma shows that it is sometimes sufficient to construct a 
flow in which the congestion on a state is small. 

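Suppose that $f$ is an $(\mathcal{M},\mathcal{M}')$-flow. The congestion of state $z$ in $f$ is the quantity
\[
B_z(f) = \frac{1}{\pi(z)}\sum_{\gamma\in\mathcal{P}:\,z\in\gamma}|\gamma|\,f(\gamma).
\]
Let $B(f) = \max_{z\in\Omega}B_z(f)$. Let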

\[
\kappa(f) = \max_{(z,w):\,A_{z,w}(f)>0}\Bigl(\sum_{x\in\Omega}\min\{P(z,x),\,R(P)(w,x)\}\Bigr)^{-1}.
\]

Lemma 27. Suppose that $f$ is an $(\mathcal{M},\mathcal{M}')$-flow. Then there is an $(\mathcal{M},\mathcal{M}')$-flow $f'$ with $A(f') \le 8\kappa(f)B(f)$.

Proof. Without loss of generality, we will assume that the flow $f$ is supported by simple paths. That is, $f(\gamma) = 0$ if the path $\gamma$ has a repeated vertex. See the remark after Theorem ^

Let $p(z,w,x)$ denote $\min\{P(z,x),\,R(P)(w,x)\}$ and let $\delta(z,w) = \sum_{x\in\Omega}p(z,w,x)$. Construct $f'$ as follows. For every path $\gamma = (x_0,\ldots,x_k)$, route the $f(\gamma)$ units of flow from $x_0$ to $x_k$ along a collection of paths of length $2k$. In particular, spread the flow along $\gamma$ from $x_i$ to $x_{i+1}$ as follows. For each $x \in \Omega$, route $p(x_i,x_{i+1},x)f(\gamma)/\delta(x_i,x_{i+1})$ of this flow along the route $x_i, x, x_{i+1}$.

First we check that $f'$ is an $(\mathcal{M},\mathcal{M}')$-flow. Note that if $p(x_i,x_{i+1},x) > 0$ then both $P(x_i,x) > 0$ and $R(P)(x_{i+1},x) > 0$ so (since $\pi(x_{i+1})$ and $\pi(x)$ are assumed to be nonzero) $P(x,x_{i+1}) > 0$. We conclude that the edges used by $f'$ are edges of $E^*(\mathcal{M})$. Also, each edge appears at most twice, as required, since $f$ is simple.

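Now we bound the congestion of $f'$. Let $(z,w)$ be an edge in $E^*(\mathcal{M})$. By definition, the congestion of edge $(z,w)$ in $f'$ is
\[
\begin{aligned}
A_{z,w}(f') &= \frac{1}{\pi(z)P(z,w)}\sum_{\gamma'\in\mathcal{P}':\,(z,w)\in\gamma'} r((z,w),\gamma')\,|\gamma'|\,f'(\gamma') \\
&\le \frac{2}{\pi(z)P(z,w)}\sum_{\gamma'\in\mathcal{P}':\,(z,w)\in\gamma'} |\gamma'|\,f'(\gamma'). \qquad (8)
\end{aligned}
\]

But the flow $f'$ was constructed by "spreading" the flow $f(\gamma)$ on each $\gamma \in \mathcal{P}$ over a number of paths $\gamma'$ with $|\gamma'| = 2|\gamma|$ as described above. Thus, the right-hand side of (8) is at most
\[
\begin{aligned}
&\frac{2}{\pi(z)P(z,w)}\Bigl(\sum_{y\in\Omega}\;\sum_{\gamma\in\mathcal{P}:\,(z,y)\in\gamma} 2|\gamma|\,f(\gamma)\,\frac{p(z,y,w)}{\delta(z,y)} + \sum_{y\in\Omega}\;\sum_{\gamma\in\mathcal{P}:\,(y,w)\in\gamma} 2|\gamma|\,f(\gamma)\,\frac{p(y,w,z)}{\delta(y,w)}\Bigr) \\
&\quad\le \frac{4\kappa(f)}{\pi(z)P(z,w)}\Bigl(\sum_{y\in\Omega}\;\sum_{\gamma\in\mathcal{P}:\,(z,y)\in\gamma} |\gamma|\,f(\gamma)\,p(z,y,w) + \sum_{y\in\Omega}\;\sum_{\gamma\in\mathcal{P}:\,(y,w)\in\gamma} |\gamma|\,f(\gamma)\,p(y,w,z)\Bigr) \\
&\quad\le \frac{4\kappa(f)}{\pi(z)}\sum_{y\in\Omega}\;\sum_{\gamma\in\mathcal{P}:\,(z,y)\in\gamma} |\gamma|\,f(\gamma) + \frac{4\kappa(f)}{\pi(w)}\sum_{y\in\Omega}\;\sum_{\gamma\in\mathcal{P}:\,(y,w)\in\gamma} |\gamma|\,f(\gamma) \\
&\quad\le 4\kappa(f)\bigl(B_z(f) + B_w(f)\bigr).
\end{aligned}
\]
Here the second line uses $1/\delta \le \kappa(f)$ on edges carrying flow, and the third uses $p(z,y,w) \le P(z,w)$ and $p(y,w,z) \le R(P)(w,z) = \pi(z)P(z,w)/\pi(w)$.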



□ 
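
The spreading construction in this proof can be sketched in code for a single path. The sketch below is only one way to realise it: it assumes that the fractions chosen at different steps of the path are combined multiplicatively to form complete paths of length $2k$, which the proof does not spell out. The example chain, the function names and the dictionary representation of a flow are all hypothetical.

```python
from fractions import Fraction as F
from itertools import product

def time_reversal(P, pi):
    """R(P)(x,y) = pi(y) / pi(x) * P(y,x)."""
    n = len(P)
    return [[pi[y] / pi[x] * P[y][x] for y in range(n)] for x in range(n)]

def spread_path(path, flow, P, pi):
    """Spread `flow` units on `path` over paths with twice as many edges.

    Each step (a, b) of the path is replaced by a -> x -> b, where the
    intermediate state x receives the fraction p(a, b, x) / delta(a, b)
    with p(a, b, x) = min(P(a, x), R(P)(b, x)).
    Assumption: fractions of different steps are multiplied together.
    """
    n = len(P)
    R = time_reversal(P, pi)
    options = []  # for each step: list of (intermediate state, fraction)
    for a, b in zip(path, path[1:]):
        p = [min(P[a][x], R[b][x]) for x in range(n)]
        delta = sum(p)
        if delta == 0:
            raise ValueError("no intermediate state available for this step")
        options.append([(x, p[x] / delta) for x in range(n) if p[x] > 0])
    new_flow = {}
    for choice in product(*options):
        new_path = [path[0]]
        weight = flow
        for (x, frac), b in zip(choice, path[1:]):
            new_path += [x, b]
            weight *= frac
        key = tuple(new_path)
        new_flow[key] = new_flow.get(key, 0) + weight
    return new_flow

# Hypothetical lazy walk on a directed 3-cycle; pi is uniform.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 2), F(0), F(1, 2)]]
pi = [F(1, 3)] * 3

flows = spread_path((0, 1, 2), F(1), P, pi)
for new_path, weight in sorted(flows.items()):
    print(new_path, weight)
```

In this example one unit of flow on the path (0, 1, 2) is split evenly over four paths with four edges each, and every edge used has positive transition probability.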
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